Abstract

The purpose of this closely aligned conceptual replication study was to investigate the efficacy 
of a Tier 2 kindergarten mathematics intervention. The replication study differed from the 
initial randomized controlled trial on three important elements: geographical region, timing 
of the intervention, and instructional context of the counterfactual. Similar to the original 
investigation, however, the current study tested the same intervention, used the same 
outcome measures and statistical analyses, and involved the same population of learners. 
A total of 319 kindergarten students with mathematics difficulties from 36 kindergarten 
classrooms participated in the study. Students who were randomly assigned to the treatment 
condition received the intervention in small-group formats, with 2 or 5 students per group. 
Control students participated in a no-treatment control condition. Significant effects on 
proximal and distal measures of mathematics achievement were found. Effect sizes obtained 
for all measures fell within or exceeded the upper bound of the effects reported in the initial 
study. Implications for systematically situating replication studies in larger frameworks of 
intervention research and reporting rates of treatment response across replication studies 
are discussed. 


Replication is a fundamental principle of 
scientific research (Flay et al., 2005; 
Gottfredson et al., 2015; Schmidt, 2009; 
Valentine et al., 2011). Replication studies 
allow the research community to rule out 
chance as a plausible explanation of previous
findings and build a convergence of
empirical evidence in favor of an intervention
or instructional practice (Coyne, Cook,
& Therrien, 2016; Gottfredson et al., 2015).
These studies also serve as a way to determine
if findings obtained in a previous study
hold up to variations in settings and contextual
factors (Coyne et al., 2013). Establishing
evidence of replication, therefore, adds
credibility and generalizability to the 
hypotheses and claims of the original 
research (Schmidt, 2009). 


Operationalizing Replication in Educational Research

Within most scientific fields, replication studies
are categorized in two ways: direct replication
and conceptual replication (Coyne et al.,
2016; Makel, Plucker, & Hegarty, 2012;
Schmidt, 2009). Direct replications are conducted
using the same methods and under the
same conditions as the original research. In the
field of education research, direct replications
are difficult, if not impossible, to conduct given
the complex and dynamic environments of
schools. A more feasible option for educational
researchers is to conduct closely aligned conceptual
replications (Coyne et al., 2016).
Closely aligned conceptual replications are
conducted to verify whether findings generalize
across settings, conditions, and participants.
Such replications typically vary from the original
study on one or two elements (Schmidt,
2009). If such variations are minimal, conceptual
replications can demonstrate similar capacity
as direct replications. For example, a closely
aligned replication can uncover whether treatment
effects demonstrated in the original study
replicate in a different geographical region
(Coyne et al., 2013).

Replication Research and the Changing Landscape of the Counterfactual

Another aspect of replication studies concerns 
the control condition, or the counterfactual. 
The counterfactual represents what might have 
occurred had the treated sample not received 
the treatment (Lemons, Fuchs, Gilbert, &
Fuchs, 2014; Shadish, Cook, & Campbell,
2002). Replication studies of beginning reading
interventions have discovered that dimensions
of the counterfactual change across time
(Lemons et al., 2014) and vary by school and
across different geographical regions (Coyne
et al., 2013). Results from these studies suggest
that an intervention implemented in a similar
fashion to the same population may produce
vastly different results based on the nature and
strength of the counterfactual. Variability of the
counterfactual therefore may change interpretation
of observed treatment effects across a
program of research (Lemons et al., 2014). For
example, a study that establishes a mathematics
intervention as evidence based relative to a
counterfactual could lose its treatment effect if 
the counterfactual shifts or strengthens in a 
subsequent replication study. 

Instructional dimensions of the counterfactual
are commonly billed as “business-as-usual”
(BAU) instruction. In many mathematics
intervention studies, BAU instruction represents
the core mathematics instruction provided
in general education settings. The aim of core
mathematics instruction is to address the range
of mathematics standards that students are
expected to know at the end of each grade level
(e.g., Common Core State Standards; National
Governors Association Center for Best Practices
& Council of Chief State School Officers,
2010). When core mathematics instruction represents
the counterfactual in mathematics intervention
studies, its nature and strength are also
expected to vary within and across classrooms.
In part this is due to the diverse range of commercially
available core mathematics programs
and instructional approaches used to teach
mathematics in U.S. classrooms (Morgan, Farkas,
& Maczuga, 2015).

In summary, the control condition or counterfactual
is an essential component of intervention
research as it allows researchers to
rule out alternative explanations of a treatment
effect. However, the nature and strength
of the counterfactual can vary by location and 
improve across time. As evidenced by prior 
research, such variations of the counterfactual 
may attenuate observed treatment effects 
(Coyne et al., 2013; Lemons et al., 2014). 

Purpose of the Study 

The primary purpose of this study was to conduct
a closely aligned conceptual replication
that tested the efficacy of the ROOTS mathematics
intervention in schools that offered different
instructional and contextual dimensions
than those included in a previous investigation
of the intervention (Clarke et al., 2016).
Because ROOTS had demonstrated preliminary
treatment effects (Clarke et al., 2016),
the current study sought to examine whether
the intervention’s efficacy would hold up to
planned variations in geographical location, 
intervention timing, and instructional context 
of the counterfactual while holding constant 
other theoretically important variables (e.g., 
intervention dosage). A secondary purpose 
was to situate the current study within an a 
priori framework of systematic replication 
(Coyne et al., 2016). 

The ROOTS intervention is a 50-lesson 
Tier 2 kindergarten program designed to accelerate
the learning of kindergarten students
with mathematics difficulties (MD). In this
article, we use the term MD rather than mathematics
disability because it encompasses a
broader range of students. Students with MD
perform at or below the 35th percentile on
standardized mathematics measures. Central
to the ROOTS intervention is the integration
of foundational concepts of whole number and
validated principles of systematic and explicit
mathematics instruction (Archer & Hughes,
2010; Doabler & Fien, 2013; Gersten et al.,
2009). Our initial efficacy trial of ROOTS,
which occurred in the 2012-2013 school year,
utilized a randomized block design (Clarke
et al., 2016). A total of 290 students from 37
kindergarten classrooms in Oregon were randomly
assigned within classrooms to one of
three conditions: (a) a ROOTS-Small intervention
group with a 2:1 student-teacher ratio
(n = 58), (b) a ROOTS-Large intervention
group with a 5:1 student-teacher ratio (n =
145), or (c) a no-treatment (BAU) control condition
(n = 87). Despite group size differences,
students in both ROOTS groups received the
same intervention dosage (i.e., 50 lessons, 5
days per week, 20 min per day). Control students
received only the core mathematics
instruction provided in their kindergarten 
classrooms. Treatment students, however, 
received the ROOTS intervention in addition 
to their core mathematics instruction. 

To test the efficacy of ROOTS in Oregon,
Clarke et al. (2016) aggregated treatment students
in ROOTS-Small and ROOTS-Large
intervention groups and compared their mathematics
achievement gains across the intervention
time period to students in the control
condition. Findings from the initial efficacy
trial indicated statistically significant differences
in favor of ROOTS students on four of the six
outcome measures. Reported effect sizes
(Hedges’ g) for the Test of Early Mathematics
Ability-Edition 3 (TEMA-3; Ginsburg &
Baroody, 2003), Assessing Student Proficiency
in Early Number Sense (ASPENS; Clarke,
Rolfhus, Dimino, & Gersten, 2012), ROOTS
Assessment of Early Numeracy Skills
(RAENS; Doabler, Clarke, & Fien, 2012), and
oral counting (Clarke & Shinn, 2004) were
0.32, 95% confidence interval (CI) [0.13, 0.50];
0.58, 95% CI [0.36, 0.79]; 0.75, 95% CI [0.53,
0.96]; and 0.28, 95% CI [0.02, 0.54], respectively.
Statistically significant differences
between conditions were not found for the
Number Sense Brief (NSB; Jordan, Glutting, &
Ramineni, 2008), the Stanford Early School
Achievement Test (SESAT; Harcourt Brace
Educational Measurement, 2003), and Stanford
Achievement Test-10th Edition (SAT-10;
Harcourt Brace Educational Measurement,
2002).
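
As an illustration of how an effect size of this kind is commonly obtained, the following minimal Python sketch computes the textbook two-group Hedges' g with a large-sample 95% CI. It is a sketch under stated assumptions and not the computation used by Clarke et al. (2016), whose estimates may have been derived from model-based gain scores. The function name `hedges_g` and the means and standard deviations in the usage lines are hypothetical; only the group sizes (203 treatment students, 87 control students) come from the Oregon trial.

```python
import math


def hedges_g(mean_t, mean_c, sd_t, sd_c, n_t, n_c):
    """Return (g, ci_low, ci_high) for two independent groups.

    Textbook formula; assumed here for illustration only.
    """
    df = n_t + n_c - 2
    # Pooled standard deviation across the two groups.
    sd_pooled = math.sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / df)
    d = (mean_t - mean_c) / sd_pooled
    # Small-sample correction factor (common approximation).
    j = 1 - 3 / (4 * df - 1)
    g = j * d
    # Large-sample approximation to the standard error of g.
    se = math.sqrt((n_t + n_c) / (n_t * n_c) + g**2 / (2 * (n_t + n_c)))
    return g, g - 1.96 * se, g + 1.96 * se


# Hypothetical posttest means and SDs; group sizes from the Oregon trial.
g, ci_low, ci_high = hedges_g(25.0, 20.0, 10.0, 10.0, 203, 87)
print(f"g = {g:.2f}, 95% CI [{ci_low:.2f}, {ci_high:.2f}]")
```

Under this approximation, an interval that excludes zero corresponds to a statistically significant difference at the .05 level, which is one way to read the four intervals reported above.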

We consider the current study as a closely
aligned replication based on the degree of overlap
with the Oregon efficacy trial. For example,
both studies employed a randomized block
design and applied the same a priori criteria for
determining students’ eligibility for the intervention.
In both studies, the 10 lowest-performing
kindergarten students from each classroom
who met the eligibility criteria were randomly
assigned within classrooms to the same three
experimental conditions. In addition, similar to
the Oregon study, the replication had district-employed
personnel deliver the ROOTS intervention
in small-group formats and at the same
dose frequency (i.e., 50 lessons, 20 min per day,
5 days per week for approximately 10 weeks).
Analyses used in both studies aggregated students
in the ROOTS groups to compare gains in
mathematics outcomes relative to students in 
the control condition. 

Notwithstanding strong overlap with the 
research design, analytic procedures, and 


Downloaded from ecx.sagepub.com at UNIV OF OREGON on October 10, 2016 




Doabler et al. 




intervention dosage levels, the current study
varied from the Oregon efficacy trial on several
important contextual and instructional dimensions.
One difference between the two studies
was geographical location. Whereas Clarke
et al. (2016) investigated ROOTS in suburban
and rural schools from across Oregon, the current
study was conducted in urban and suburban
schools from the metropolitan area of
Boston, Massachusetts. It is important to note
that in the current study, we purposefully varied
the geographical region to determine
whether the effects of ROOTS held up in classrooms
with different instructional contexts and
students with different sociodemographic characteristics.
For example, schools that participated
in the replication had a more diverse
demography and served larger percentages of 
students from economically disadvantaged 
backgrounds than the Oregon schools. Onset of 
the intervention in the Boston study also 
occurred at a different time point in the school 
year than the original efficacy trial. Students in 
the Oregon study received ROOTS starting in 
mid-January; however, the replication began 
nearly 2 months earlier (mid-November). We 
altered the onset of the intervention to see if an 
earlier start time in the school year was more 
effective for at-risk kindergarteners and would 
better align with students’ acquisition of early 
number sense. 

In addition, the core mathematics programs 
used in the replication study’s kindergarten 
classrooms (i.e., control condition) differed 
from those delivered in the Oregon study. The 
replication’s programs were also noted as 
having a more robust evidence base for 
improving student mathematics achievement. 
Recognizing these contextual and instructional
differences, the new research sites were
expected to offer a unique counterfactual.
Thus, positive findings from the replication
would increase the credibility and generalizability
of the intervention’s beneficial impact
on student mathematics outcomes. 

Method 

This replication of the efficacy evaluation of
ROOTS was conducted as a partially nested
randomized controlled trial (RCT), as described
by Bauer, Sterba, and Hallfors (2008) and
Baldwin, Bauer, Stice, and Rohde (2011). The
10 lowest-performing students from each participating
kindergarten classroom were randomly
assigned to one of three conditions: (a) a
ROOTS instructional group with a 2:1 student-teacher
ratio, (b) a ROOTS instructional group
with a 5:1 student-teacher ratio, or (c) a no-treatment
control condition. The study is partially
nested because the intervention students
were nested within interventionists’ small
groups whereas control students were not. Students
randomly assigned to the two treatment
groups received the ROOTS intervention in 
addition to district-approved core mathematics 
instruction. Students in the control condition 
received district-approved core mathematics 
instruction only. This design requires a specific 
analysis approach to account for clustering, 
which we describe later. 

The study was designed to allow for four 
replications of the efficacy of ROOTS. The 
potentially important difference between the 
two treatment groups (i.e., 2:1 ROOTS-Small 
vs. 5:1 ROOTS-Large) was a key aim of the 
full, 4-year efficacy trial. Because the differences
between students in the two intervention
groups were expected to be smaller than
those between intervention and control, this
replication study, by itself, is underpowered to
test questions about group size. Students in
the ROOTS groups were combined to compare
their gains on important mathematics
outcomes relative to students in the no-treatment
control condition. Future research will
investigate the effect of grouping within the 
ROOTS intervention. 

Participants 

The principal investigators and key personnel
of the ROOTS project conducted all recruitment
efforts. These efforts entailed contacting
district leaders of public school districts in the
metropolitan area of Boston, Massachusetts.
District leaders were provided information on
the study’s research aims and activities. Interested
district leaders then identified potential
schools for participation, namely, those that
contained large percentages of students in
need of intensive instructional support in
mathematics. Schools targeted for recruitment
were those that received Title I funding. Principals
and the kindergarten teachers of schools
were then contacted. All kindergarten teachers 
in each participating school were eligible to 
participate in the study. 

Schools. This study took place in 36 kindergarten
classrooms across two school districts
in the metropolitan area of Boston, Massachusetts.
Districts A and B had total enrollments
of 6,118 and 6,843 students, respectively.
Nine schools participated from these two districts.
In District A, all kindergarten students
attended the same school, whereas eight separate
schools from District B participated in the
study. 

Classrooms. Of the 36 classrooms, 32 provided 
a full-day kindergarten program, and four 
classrooms provided a half-day kindergarten 
program. All classrooms provided 5 days per 
week of core mathematics instruction in English
and had an average of 23.7 students (SD =
6.1). Participating classrooms were taught by
36 certified kindergarten teachers. One teacher
went on maternity leave during the course of
the intervention, and the remaining 35 teachers
participated for the duration of the study. Of
the 36 teachers, 32 teachers provided the following
demographic information. All teachers
were female, and the majority of teachers were 
White (91%). Teachers had an average of 15 
years (SD = 8.68) of teaching experience and 
10 years (SD = 7.29) of experience teaching at 
the kindergarten level. Of the 32 teachers, 66% 
held a graduate degree in education, and 56% 
had previously completed at least one graduate 
course in algebra. 

Interventionists. Similar to the Oregon efficacy
trial, ROOTS groups were taught by
instructional assistants who were employed
by the two participating Boston districts. All
interventionists were female, and 62% had a
bachelor’s degree or higher. Among the interventionists,
91% had prior experience with
providing small-group instruction, 53% had
taken at least one college-level course in algebra,
56% were White, and 15% reported their
ethnicity as Hispanic. 


Criteria for participation. The current study
applied the same process for determining students’
eligibility for the intervention as the
Oregon efficacy trial. To determine eligibility
for ROOTS, all participating students from
the 36 classrooms, who had parental consent,
were screened on the ASPENS (Clarke, Rolfhus,
et al., 2012) and NSB (Jordan et al.,
2008). Students with both an NSB score of 20
or less and an ASPENS composite score in
the “strategic” or “intensive” ranges were
considered at risk for MD and eligible for the
intervention. The ASPENS and NSB scores
of ROOTS-eligible students were separately
converted into standard scores and then combined
to form an overall composite standard
score. In each classroom, ROOTS-eligible
students’ composite standard scores were
rank ordered. The 10 ROOTS-eligible students
with the lowest composite standard
scores were randomly assigned to one of
three conditions: (a) a ROOTS-Small (2:1) 
group, (b) a ROOTS-Large (5:1) group, or (c) 
a no-treatment BAU control condition. Thus, 
in each classroom, two students were in the 
ROOTS-Small group, five students were in 
the ROOTS-Large group, and three were in 
the control condition. 
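
The selection and assignment steps described above can be sketched in Python as follows. This is a hedged illustration rather than the project's actual procedure: the function names, the seed, and the classroom data are hypothetical, the composite is assumed to be the mean of the two standard scores, and standardization is assumed to be carried out over whatever pool of eligible students is passed in.

```python
import random
from statistics import mean, pstdev


def standardize(values):
    """Convert raw scores to z-scores (assumes the scores are not all identical)."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def assign_conditions(eligible, rng, n_selected=10):
    """eligible: list of (student_id, aspens_score, nsb_score) tuples."""
    ids = [s[0] for s in eligible]
    z_aspens = standardize([s[1] for s in eligible])
    z_nsb = standardize([s[2] for s in eligible])
    # Assumption: the composite is the mean of the two standard scores.
    composite = {i: (a + b) / 2 for i, a, b in zip(ids, z_aspens, z_nsb)}
    # Rank order the composites and keep the lowest-scoring students.
    lowest = sorted(ids, key=composite.get)[:n_selected]
    # Random assignment: shuffle, then split 2 / 5 / 3.
    rng.shuffle(lowest)
    return {
        "ROOTS-Small": lowest[:2],
        "ROOTS-Large": lowest[2:7],
        "Control": lowest[7:10],
    }


rng = random.Random(2013)  # fixed seed so the sketch is reproducible
# Hypothetical classroom of 14 eligible students with made-up scores.
classroom = [(f"S{i:02d}", rng.randint(0, 40), rng.randint(0, 20)) for i in range(14)]
groups = assign_conditions(classroom, rng)
print({name: len(members) for name, members in groups.items()})
```

Shuffling the 10 selected students and slicing the shuffled list gives every student the same chance of landing in each condition while fixing the 2:5:3 allocation within the classroom.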

Of the 36 classrooms, 26 had at least 10 
ROOTS-eligible students per classroom. After 
random assignment, these 26 classrooms provided
52 ROOTS groups (n = 26 ROOTS-Small,
n = 26 ROOTS-Large). For the 10
classrooms that had fewer than 10 ROOTS-eligible
students, we applied a cross-class
grouping procedure. For example, in one
school, we combined two half-day session
classrooms and then randomly assigned the
10 ROOTS-eligible students to one of three
conditions: ROOTS-Small, ROOTS-Large,
or control. A total of seven ROOTS-Small
groups, seven ROOTS-Large groups, and six
control groups were formed using the cross-class
grouping procedure. In four of these
combined classrooms, the ROOTS-Large
group had four students instead of five, as
only nine eligible students were available
after combining classrooms. In one of the
combined classrooms, a control group could
not be formed, as only six students in the
classroom were eligible for ROOTS. In all,
the 36 classrooms provided a total of 66
ROOTS groups (n = 33 ROOTS-Small, n = 
33 ROOTS-Large). 

Students. A total of 878 kindergarten students 
were screened for ROOTS eligibility in the 
late fall of 2013. Of these students, 319 met
the ROOTS eligibility criteria and were randomly
assigned to the ROOTS-Small condition
(n = 67), the ROOTS-Large condition
(n = 162), or the control condition (n = 90).
Similar to the initial ROOTS efficacy trial, the
current study combined students in the two
ROOTS conditions (small and large) in order
to assess the effects of ROOTS intervention as
compared to the control condition. Demographic
information for all ROOTS-eligible
students is presented in Table 1. 
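
The group and student counts reported in this section can be cross-checked with a few lines of arithmetic. The short Python sketch below uses only figures stated in the text; the variable names are ours.

```python
# ROOTS groups: 26 intact classrooms plus 7 cross-class groupings per group size.
small_groups = 26 + 7
large_groups = 26 + 7
total_groups = small_groups + large_groups

# Students randomly assigned to each condition.
n_small, n_large, n_control = 67, 162, 90
n_roots = n_small + n_large      # combined ROOTS condition
n_total = n_roots + n_control    # all ROOTS-eligible students

print(total_groups, n_roots, n_total)  # 66 229 319
```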

Procedures 

ROOTS intervention. ROOTS is a Tier 2 kindergarten
mathematics intervention program that
consists of 50 lessons delivered in 20-min sessions.
Similar to the initial study, ROOTS was
delivered in small-group formats (two or five
students per group), 5 days per week for approximately
10 weeks. However, unlike the initial
study, the onset of the ROOTS intervention in
the Boston study began in November and ended
in March of the 2013-2014 school year, which
was an earlier start by nearly 2 months. It is
important to note that the intervention occurred
at times that did not conflict with students’ core
mathematics instruction.

Table 1. Descriptive Statistics for Student Characteristics by Condition.

Student characteristic               ROOTS        Control
Age at pretest, M (SD)               5.2 (0.4)    5.2 (0.4)
Male                                 47%          58%
Race
  American Indian/Alaskan Native     0%           1%
  Asian                              1%           2%
  Black                              6%           7%
  Native Hawaiian/Pacific Islander   0%           0%
  White                              91%          84%
  More than one race                 2%           6%
Hispanic                             49%          51%
Limited English proficiency          23%          28%
SPED eligible                        8%           14%

Note. The sample included 229 students in the ROOTS
condition and 90 students in the control condition.
SPED = special education.

The ROOTS intervention is designed to 
promote procedural fluency and conceptual 
understanding in whole-number concepts and 
skills. This focus is consistent with the Common
Core State Standards for mathematics
(CCSS-M; National Governors Association
Center for Best Practices & Council of Chief
State School Officers, 2010) and calls from
expert panels (Gersten et al., 2009). Specifically,
ROOTS instruction prioritizes concepts
from the Counting and Cardinality and Operations
and Algebraic Thinking domains of the
CCSS-M. This deep focus on whole-number
concepts is designed to help struggling students
develop a robust number sense. ROOTS
facilitates mathematics proficiency by prioritizing
principles of explicit and systematic
instruction. For example, lessons judiciously
include essential features of explicit mathematics
instruction, such as teacher modeling,
deliberate practice, visual representations of
mathematics, and academic feedback.
ROOTS also incorporates opportunities for
students to verbalize their mathematical thinking
and discuss problem-solving methods.
More information on ROOTS is described in 
Clarke et al. (2016). 

Professional development. Similar to the Oregon
efficacy trial, all participating interventionists
received two 5-hr professional
development workshops delivered by project
staff. The first workshop focused on the
instructional objectives and content of Lessons
1 to 25, empirically validated instructional
practices in mathematics, and
small-group management techniques. The
second workshop focused on Lessons 26 to
50. Both workshops incorporated opportunities
for interventionists to practice and receive
feedback on lesson delivery from coaches and
project staff. During intervention implementation,
all interventionists received instructional
support from two ROOTS coaches. The
two coaches were former educators with specialized
knowledge and training in effective
early mathematics instruction and small-group
instructional practices. Coaching visits
included direct observations of lesson delivery
followed by feedback on the quality of
instruction as well as fidelity of intervention 
implementation. 

To ensure a high level of implementation 
fidelity, the number of coaching visits varied 
based on interventionists’ implementation 
needs. Although interventionists received an 
average of 2.3 (SD = 0.98) coaching visits 
over the course of the study, 22 interventionists
demonstrated implementation difficulties
and thus received three or four coaching visits.
The remaining interventionists had no difficulties
with the intervention and thus
received one or two visits. 

Control condition. Core (Tier 1) math instruction
delivered in the kindergarten classroom
served as the control condition or counterfactual.
All ROOTS (small and large) and control
students received daily core mathematics
instruction. The control condition was documented
through teacher surveys and direct
observations of core mathematics instruction.
Observation and survey data indicated that
classrooms in District A primarily used the
Scott Foresman mathematics curriculum during
core math instruction, whereas classrooms
in District B primarily used the enVisionMath
curriculum. Based on the level of available
evidence, the What Works Clearinghouse
(WWC; n.d.) has rated both of these core
mathematics programs as effective for
improving student mathematics achievement.
Teachers also reported that they supplemented
these curricula with their own materials.
Observations indicated that ROOTS intervention
materials were not used during core
mathematics instruction. 

Kindergarten classroom teachers reported 
that they provided an average of 46 min per 
day of core mathematics instruction (SD = 
13.64). Survey data also indicated that students
received whole-number instruction during
calendar time. All teachers reported that
core instruction included activities on counting
and cardinality as well as numbers and
operations in base 10. The majority of teachers
(79%) indicated that knowing number
names and the count sequence was the main
instructional priority when teaching whole-number
concepts and skills, and 17.2%
reported that their core instruction emphasized
counting to tell the number of objects.
All teachers provided teacher-led instruction,
and the majority of teachers also reported that
core mathematics instruction included peer or
group work, independent student work, and
learning centers. The majority of the teachers
reported that small-group formats (93%) and
individualized instruction (66%) were used
during core mathematics instruction. Finally,
teachers reported that they regularly incorporated
principles of explicit instruction, such as
demonstrations of mathematics concepts, 
visual representations, and opportunities for 
mathematics verbalizations. 

In each participating classroom, trained 
research staff conducted direct observations 
of core mathematics instruction. Observation 
data indicated that all teachers provided 
teacher-led mathematics instruction, with 
some teachers providing a variety of other 
learning opportunities, such as independent 
practice, peer and small-group activities, and 
technology-based activities. The primary 
mode of instructional delivery was teacher-led
instruction (92%), and nearly 75% of core
instruction focused on operations and algebraic
thinking (e.g., addition and subtraction),
whereas 17.9% of instructional periods primarily
focused on counting. The majority of
observations showed clear evidence of the
following principles of explicit and systematic
instruction (Gersten et al., 2009): academic
feedback, visual representations,
teacher demonstrations, guided practice
opportunities, and opportunities for students
to verbalize their mathematical thinking.
However, observations documented that
teachers were less likely to provide independent
and written mathematics practice and
scaffolded instruction for struggling students. 

Fidelity of Implementation 

In order to determine the extent to which the
ROOTS intervention was delivered as
intended, fidelity of ROOTS implementation
was directly observed by trained research
staff. Each ROOTS group was observed
three times over the course of the intervention.
A total of 190 observations of implementation
fidelity were conducted, of which
31 included two observers. It is important to
note that these observations were separate
from the fidelity observations conducted by
the ROOTS coaches. For these fidelity
observations, observers used a 4-point scale
(4 = all, 3 = most, 2 = some, 1 = none) to rate
the extent to which interventionists met the
lesson’s instructional objectives, followed
the provided teacher scripting, and used the
prescribed math models for that lesson.
Observers also recorded the number of prescribed
activities delivered during the lesson.
Interventionists were observed to deliver the
majority of the prescribed activities (M =
4.26 out of 5 activities, SD = 0.34). Observers
also noted that interventionists met lesson
objectives (M = 3.46, SD = 0.46),
followed teacher scripting (M = 3.37, SD =
0.48), and used prescribed math models (M =
3.58, SD = 0.42).

To describe interobserver reliability for the 
fidelity measures, intraclass correlation coefficients
(ICCs) were calculated. The ICC for
the aggregate fidelity score was substantial
(0.87). Further, ICCs for individual fidelity
ratings indicated substantial to perfect agreement:
(a) 1.0 for number of activities delivered,
(b) 0.68 for met math objectives, (c)
0.79 for followed teacher scripting, and (d) 
0.87 for used prescribed math models (Landis 
& Koch, 1977). 

ROOTS coaches also used direct observations
to rate fidelity of implementation. Each
ROOTS group was observed and rated
between two and four times. During these
observations, coaches rated the quality of student-teacher
interactions and fidelity of implementation
on separate 5-point scales (1 = low,
3 = medium, 5 = high) and recorded the duration
of each lesson in minutes. On average, ROOTS
lessons were delivered in 22.36 min (SD =
3.87). Coaches rated overall implementation
fidelity in the medium to medium-high range,
with variability across interventionists (M =
3.72, SD = 0.92). Quality of student-teacher
interactions was also rated in the medium-high
range (M = 4.02, SD = 0.86).


Intervention dosage was also recorded as 
an indicator of implementation fidelity. Of the 
66 ROOTS groups, 62 groups provided a full 
or nearly full dose of the intervention (98% to 
100% of the lessons). Four of the groups did 
not provide information on lesson completion.

Measures 

All treatment and control students were
administered five measures of whole-number
understanding at pretest (T1) and posttest
(T2). These measures consisted of one proximal
measure of skills taught in
ROOTS, a set of curriculum-based measures
that assessed discrete skills related to early
number sense, and two distal measures of
whole-number understanding. An additional
distal outcome measure was administered
approximately 6 months into students’ first-grade
year (T3). Trained research staff administered
all student measures. Interscorer
reliability criteria were met for all assessments
(i.e., > .95).

RAENS (Doabler et al., 2012) is a
researcher-developed instrument that was
administered at T1 and T2 time periods.
RAENS is individually administered and consists
of 32 items. Items assess aspects of counting
and cardinality, number operations, and the
base-10 system. In an untimed setting, students
are asked to count and compare groups
of objects; write, order, and compare numbers;
label visual models (e.g., 10-frames); and
write and solve single-digit addition expressions
and equations. RAENS’s predictive
validity ranges from .68 to .83 for the TEMA-3 
and the NSB. Interrater scoring agreement is 
reported at 100% (Clarke et al., 2016). 

Oral counting-early numeracy curriculum-based
measurement (Clarke & Shinn, 2004) is a
curriculum-based measure that was administered
at T1 and T2. Students orally count in English
for 1 min, and the discontinue rule applies
after the first counting error. The highest
correct number counted represents a student’s
score. Fall-of-kindergarten oral counting
skills have predictive validity with SAT-10
performance in the spring of kindergarten as
well as high interscorer and test-retest
reliability. Test-retest reliability and
alternate-form reliability are reported at above
.80, concurrent validity is reported as ranging
from .49 to .70, and predictive validity with
the Woodcock-Johnson Applied Problems
subtest (Woodcock & Johnson, 1989) and the
Number Knowledge Test (Okamoto & Case,
1996) is reported as ranging from .46 to .72
(Clarke et al., 2011).

ASPENS (Clarke et al., 2012) is a set of 
three curriculum-based measures validated 
for screening and progress monitoring in 
kindergarten mathematics (Clarke et al., 
2012). Each 1-min fluency-based measure 
assesses an important aspect of early numeracy 
proficiency, including number identification, 
magnitude comparison, and missing number. 
Test-retest reliabilities of kindergarten ASPENS 
measures are in the moderate to high range (.74 
to .85). Predictive validity of fall scores on the 
kindergarten ASPENS measures with spring 
scores on the TerraNova 3 is reported as ranging 
from .45 to .52 (Clarke et al., 2012). ASPENS 
was administered at T1 and T2. 

NSB (Jordan et al., 2008) is an individually
administered measure with 33 items that 
assess counting knowledge and principles, 
number recognition, number comparisons, 
nonverbal calculation, story problems, and 
number combinations. NSB, which was 
administered at T1 and T2, has a coefficient 
alpha of .84 (Jordan et al., 2008). 

TEMA-3 (Ginsburg & Baroody, 2003) is a
standardized, norm-referenced, individually
administered measure of beginning
mathematical ability. The TEMA-3 assesses
whole-number understanding for children
ranging in age from 3 to 8 years 11 months.
Alternate-form and test-retest reliabilities of
the TEMA-3 are .97 and .93, respectively. The
TEMA-3, which was administered at T1 and
T2, has concurrent validity with other
mathematics measures ranging from .54 to .91.

The SAT-10 (Harcourt Brace Educational
Measurement, 2002) and the SESAT (Harcourt
Brace Educational Measurement, 2003) are
both group-administered, standardized,
norm-referenced measures. Both measures have two
mathematics subtests: Problem Solving and
Procedures. The internal consistency for the
SESAT is .88. Internal consistency reliabilities
for the SAT-10 mathematics subtests range
from .80 to .90. All treatment and control
students were administered the SESAT at
posttest (T2) and the SAT-10 midway through
their first-grade year (T3).

Statistical Analysis 

We assessed intervention effects on each of
the primary outcomes with a mixed model
(multilevel) Time × Condition analysis
(Murray, 1998) designed to account for students
partially nested within small groups (Baldwin
et al., 2011; Bauer et al., 2008). The study
design called for the randomization of
individual students to receive ROOTS, nested
within small groups, or a non-nested
comparison condition, and the analytic model must
account for the potential heterogeneity among
variances across conditions (Roberts &
Roberts, 2005). In particular, the ROOTS groups
required a group-level variance, whereas the
unclustered controls did not. Further, because
the residual variances may have differed
between conditions, we tested the assumption
of homoscedasticity of residuals.

Baldwin et al. (2011) and Bauer et al.
(2008) presented a mixed-model
analysis-of-variance approach to account for the
different variance structures between conditions
and tests for heteroscedastic residual variances,
and we expand their approach to a Time ×
Condition analysis. The analysis tests for
differences between conditions on gains in
outcomes from the fall (T1) to spring (T2) of
kindergarten. The basic model includes time,
T, coded 0 at T1 and 1 at T2; condition, C,
coded 0 for control and 1 for ROOTS; and the
interaction between the two:


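$$Y_{ij} = \pi_{0j} + \pi_{1j} C_j + \pi_{2j} T_{ij} + \pi_{3j} T_{ij} C_j + e_{ij}, \qquad e_{ij} \sim N(0, \sigma^2) \qquad (1)$$

$$\pi_{0j} = \beta_{00} \qquad (2)$$

$$\pi_{1j} = \beta_{10} \qquad (3)$$

$$\pi_{2j} = \beta_{20} \qquad (4)$$

$$\pi_{3j} = \beta_{30} + r_{3j}, \qquad r_{3j} \sim N(0, \tau^2) \qquad (5)$$
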
For Level 1, the first equation, $Y_{ij}$ represents
a score for individual $i$ within cluster $j$, and the
model includes condition, $C_j$; time, $T_{ij}$; and
their interaction as predictors. Although the $e_{ij}$
are distributed $N(0, \sigma^2)$, the Time × Condition
analysis decomposes the individual-level
variance, $\sigma^2$, into a variance for the assessments,
$\sigma_s^2$, and covariance between the T1 and T2
assessments, $r_s$, where $\sigma_s^2$ and $r_s$ sum to $\sigma^2$
(Murray, 1998). For simplicity, however, we
focus on $\sigma^2$ for the following discussion but
present both the variance and covariance terms
in the results. Because individual students
were assigned to condition and only partially
nested, the model differs in two ways from
models used for fully clustered randomized
trials (FCRT). First, the cluster $j$ has a unique
value for each group in the intervention
condition and a unique value for each individual
student in the control condition. Second, where
condition typically resides at the cluster level
in FCRT, in this partially clustered trial we
include condition at the individual level.

The Level 2 equations predict scores
with (a) an intercept, $\pi_{0j}$, which represents
the pretest control group mean; (b) the
difference between conditions at pretest, $\pi_{1j}$;
(c) the slope for control students, $\pi_{2j}$; and
(d) the difference between conditions on the
slope, $\pi_{3j}$. The intercept, condition effect at
pretest, and slope for control students do
not require cluster variances because they
represent either unclustered control group
effects or differences between conditions at
pretest, before students were clustered. The
model included a cluster-level variance, $\tau^2$,
for the Time × Condition effect to account
for the posttest clustering that occurs only
in the intervention condition. The following
composite equation is obtained by substituting
the Level 2 equations into the Level 1
equation:

$$Y_{ij} = \beta_{00} + \beta_{10} C_j + \beta_{20} T_{ij} + \beta_{30} T_{ij} C_j + (r_{3j} T_{ij} C_j + e_{ij}) \qquad (6)$$

Due to the unbalanced nesting structure, the
residual variance may differ by condition, so
we fit two models. The homoscedastic model
shown in Equation 6 assumed a single residual
error term. The second model assumed
heteroscedastic residual variances as follows:

$$\mathrm{Var}(e_{ij} \mid C_j = 0) = \sigma_0^2 \qquad (7)$$

$$\mathrm{Var}(e_{ij} \mid C_j = 1) = \sigma_1^2 \qquad (8)$$

As described previously, both $\sigma_0^2$ and $\sigma_1^2$
decompose into a variance and covariance in
the analyses.
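
The partially nested structure described above can be made concrete with a small data-generating sketch. This is only an illustration under stated assumptions, not the authors' analysis code: the function name, argument names, and parameter values are hypothetical, and for simplicity the residuals at T1 and T2 are drawn independently, which ignores the within-student covariance that the Time × Condition analysis models.

```python
import random


def simulate_partially_nested(n_groups, group_size, n_control, beta,
                              tau, sigma0, sigma1, seed=0):
    """Generate long-format records from the composite Time x Condition model.

    beta = (b00, b10, b20, b30) are the fixed effects; tau is the SD of the
    group-level effect r_3j; sigma0 and sigma1 are the residual SDs for
    control and ROOTS students (hypothetical names for this sketch).
    """
    rng = random.Random(seed)
    b00, b10, b20, b30 = beta
    rows = []
    cluster = 0

    # Treatment arm: students in the same small group share a cluster id
    # and a single random effect that applies only at posttest (T = 1).
    for _ in range(n_groups):
        cluster += 1
        r3 = rng.gauss(0.0, tau)
        for _ in range(group_size):
            for t in (0, 1):
                y = b00 + b10 + b20 * t + (b30 + r3) * t + rng.gauss(0.0, sigma1)
                rows.append({"cluster": cluster, "C": 1, "T": t, "Y": y})

    # Control arm: each student is a singleton cluster with no group effect.
    for _ in range(n_control):
        cluster += 1
        for t in (0, 1):
            y = b00 + b20 * t + rng.gauss(0.0, sigma0)
            rows.append({"cluster": cluster, "C": 0, "T": t, "Y": y})

    return rows
```

Setting `sigma0` equal to `sigma1` corresponds to the homoscedastic case; letting them differ corresponds to the heteroscedastic case.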

We tested whether the homoscedastic and
heteroscedastic models could be assumed
equivalent with a likelihood ratio test and
reported the simpler model if we were able to
accept the equivalence of the two models.
Because we test for equivalence, or the
noninferiority, of the simpler model when compared
to the more complex model, we must reverse
the null and alternative hypotheses and, hence,
the $\alpha$ and $\beta$ values that represent Type I and
Type II error rates as we might in equivalence
or noninferiority trials (e.g., Dasgupta,
Lawson, & Wilson, 2010; Piaggio, Elbourne,
Altman, Pocock, & Evans, 2006). For this reason,
as well as the low statistical power to detect
differences between variance structures
(Kromrey & Dickenson, 1996), we compare
models with a likelihood ratio test using
$\alpha$ = .20 as our criterion Type I error rate and,
as a consequence, report the more complex model
unless we are relatively certain the two are
equivalent.
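
The decision rule in the preceding paragraph can be sketched as follows. This is a minimal illustration, not the authors' SAS code: the function names are hypothetical, the -2 log-likelihood values in the usage note are invented for illustration, and the degrees of freedom (the number of extra variance parameters in the heteroscedastic model) are an assumption the caller must supply. Only the closed-form chi-square tails for 1 and 2 degrees of freedom are implemented.

```python
import math


def chi2_sf(x, df):
    """Upper-tail chi-square probability (closed forms for df = 1 or 2 only)."""
    if x < 0:
        raise ValueError("statistic must be non-negative")
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("this sketch only handles df = 1 or df = 2")


def choose_model(neg2ll_homo, neg2ll_hetero, df, alpha=0.20):
    """Likelihood ratio test between nested variance structures.

    Returns (statistic, p value, chosen model). With the liberal
    alpha = .20 criterion, the simpler homoscedastic model is kept
    only when p >= alpha.
    """
    stat = max(neg2ll_homo - neg2ll_hetero, 0.0)
    p = chi2_sf(stat, df)
    chosen = "heteroscedastic" if p < alpha else "homoscedastic"
    return stat, p, chosen
```

For example, with hypothetical fits `choose_model(3500.0, 3495.0, df=2)`, the p value falls between .05 and .20, so the liberal criterion selects the heteroscedastic model even though a conventional .05 criterion would not.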

These models test for net differences
between conditions (Murray, 1998), which
provide an unbiased and straightforward
interpretation of the results (Allison, 1990;
Jamieson, 1999). For two outcomes, the
SESAT available only at posttest and the
SAT-10 collected as a follow-up measure in Grade
1, we used the analysis-of-covariance
approach described by Bauer et al. (2008) and
Baldwin et al. (2011). For the
analysis-of-covariance approach, we also compared
homoscedastic and heteroscedastic models
with likelihood ratio tests. In all models, based
on recommendations of Baldwin et al., we
used the Satterthwaite approximation to
determine the degrees of freedom.

Because students were randomly assigned 
within classrooms and schools, we tested an 
additional set of models that extended those 
discussed earlier to account for clustering
within classrooms or schools. Aligned with 
the observations of Raudenbush and Sadoff 
(2008), the overall pattern of intervention 
results remained similar in all models, whether 
or not we included classroom or school levels 
in the model. Condition effects did not vary 
by classroom or school, either, so we omitted 
these results. 

Model estimation. We fit models to our data
with SAS PROC MIXED Version 9.2 (SAS
Institute, 2009) using restricted maximum
likelihood, generally recommended for
multilevel models (Hox, 2002). Maximum
likelihood estimation for the Time × Condition
analysis uses all available data to
provide potentially unbiased results even in
the face of substantial attrition, provided the
missing data were missing at random
(Graham, 2009). In the present study, we did not
believe that attrition or other missing data
represented a meaningful departure from
the missing-at-random assumption, meaning
that missing data did not likely depend
on unobserved determinants of the
outcomes of interest (Little & Rubin, 2002).
The majority of missing data involved
students who were absent on the day of
assessment or left the school.

The models assume independent and
normally distributed observations. We addressed
the first, more important assumption (van
Belle, 2008) by explicitly modeling the
multilevel nature of the data. The data in the
present study also do not markedly deviate from
normality; skewness and kurtosis fell within
±2.0 for all measures except for oral counting,
where kurtosis was 2.3. However, multilevel
regression methods have been found quite
robust to violations of normality (e.g., Hannan
& Murray, 1996).
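
A normality screen of this kind can be sketched with moment-based statistics. The function below is an illustrative assumption: it uses simple population moments and reports excess kurtosis (0 for a normal distribution), whereas the source does not say which estimator was used, and statistical packages often apply small-sample corrections that give slightly different values.

```python
def skewness_and_excess_kurtosis(values):
    """Return (skewness, excess kurtosis) from population central moments."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    if m2 == 0:
        raise ValueError("variance is zero")
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


def within_bounds(values, limit=2.0):
    """True when both statistics fall within +/- limit."""
    skew, kurt = skewness_and_excess_kurtosis(values)
    return abs(skew) <= limit and abs(kurt) <= limit
```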

Effect sizes. To ease interpretation, we
computed an effect size, Hedges’ g (Hedges,
1981), for each fixed effect. Hedges’ g,
recommended by the WWC (2014), represents
an individual-level effect size comparable to
Cohen’s d (Cohen, 1988; Rosenthal &
Rosnow, 2008).


Results 

Table 2 presents means, standard deviations,
and sample sizes for the seven dependent
variables by condition and assessment time.
Overall, 35% of the sample scored below the 10th
percentile on the TEMA-3 at pretest (38% of
control, 34% of ROOTS). At posttest, 22% of
the sample remained below the 10th
percentile (32% of control, 17% of ROOTS).

Attrition 

Student attrition was defined as students with
data at T1 but missing data at T2. Attrition
rates varied between 6%, for the TEMA-3
and RAENS, and 8%, for NSB, ASPENS,
and oral counting. Only 5% of students were
missing all posttest data. The proportion of
students missing all posttest data did not
differ between conditions, $\chi^2(1)$ = 3.61, p =
.0575. Although differential rates of attrition
are undesirable, differential scores on math
tests present a far greater threat to validity, so
we conducted an analysis to test whether
student math scores were differentially affected
by attrition across conditions. We examined
the effects of condition, attrition status, and
their interaction on pretest scores for all five
measures available at pretest. We found no
statistically significant interactions for any of
the outcome measures (p > .2333) and no
pretest differences between conditions for
students with posttest data (see Table 3). The
primary impact analyses incorporated all
available data, further reducing the likelihood
of bias (Graham, 2009).
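
As a quick arithmetic check on the attrition test, the p value for a chi-square statistic with 1 degree of freedom can be recovered from the complementary error function. The helper name below is hypothetical; the only input taken from the text is the reported statistic of 3.61.

```python
import math


def chi2_1df_p(stat):
    """Upper-tail p value for a chi-square statistic with 1 degree of freedom."""
    if stat < 0:
        raise ValueError("statistic must be non-negative")
    return math.erfc(math.sqrt(stat / 2.0))


p_attrition = chi2_1df_p(3.61)
print(f"p = {p_attrition:.4f}")
```

The result agrees with the reported p = .0575 to about three decimal places; the small discrepancy is consistent with the test statistic having been rounded to 3.61.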

Efficacy Effects for ROOTS 

Table 3 presents the results of the statistical 
models. The table presents the results of the 
homoscedastic model for all outcomes, as it 
was deemed equivalent to the more complicated
heteroscedastic model. The bottom two
rows of the table show the likelihood ratio 
test results that compared homoscedastic 
residuals to heteroscedastic residuals. 
Although the variance structures differed 
between these models, the condition effect 
estimates and statistical significance values 
Table 2. Descriptive Statistics for Mathematics Measures by Condition and Assessment Time.

Measure         T1 ROOTS        T1 Control      T2 ROOTS        T2 Control      T3 ROOTS        T3 Control
NSB
  M (SD)        12.66 (3.71)    11.89 (3.35)    19.71 (4.90)    17.01 (4.81)    -               -
  n             229             90              208             86              -               -
ASPENS
  M (SD)        21.67 (17.06)   17.80 (14.85)   89.10 (33.60)   63.97 (35.00)   -               -
  n             225             90              208             86              -               -
Oral counting
  M (SD)        19.80 (12.93)   18.20 (11.49)   45.83 (21.56)   41.31 (21.41)   -               -
  n             229             90              207             85              -               -
TEMA-3
  M (SD)        17.08 (7.03)    16.23 (6.67)    26.48 (7.62)    23.27 (8.06)    -               -
  n             227             88              213             88              -               -
RAENS
  M (SD)        11.83 (5.74)    11.39 (5.74)    24.31 (6.01)    17.34 (5.93)    -               -
  n             228             89              213             88              -               -
SESAT Total
  M (SD)        -               -               463.75 (38.51)  451.90 (34.24)  -               -
  n             -               -               200             82              -               -
SAT-10 Total
  M (SD)        -               -               -               -               497.09 (29.07)  495.45 (27.55)
  n             -               -               -               -               186             77

Note. NSB = Number Sense Brief (Jordan et al., 2008); ASPENS = Assessing Student Proficiency in Early Number
Sense (Clarke et al., 2012); TEMA-3 = Test of Early Mathematics Ability (3rd ed.; Ginsburg & Baroody, 2003);
RAENS = ROOTS Assessment of Early Numeracy Skills (Doabler et al., 2012); SESAT = Stanford Early School
Achievement Test (Harcourt Brace Educational Measurement, 2003); SAT-10 = Stanford Achievement Test Series
(10th ed.; Harcourt Brace Educational Measurement, 2002). The sample sizes represent students with a particular
measure at each assessment period.


were very similar for both the heteroscedastic 
and homoscedastic models. We also tested 
models with an additional level for either 
classrooms or schools. The overall pattern of 
results remained very similar in these models
as well, so we did not present the results from 
these models. Overall, the results suggest that 
the intervention effects were not particularly 
sensitive to the variance structures. 

The models in Table 3 tested fixed effects
for differences between conditions at pretest
(condition effect), gains across time, and the
interaction between the two. We found no
statistically significant differences at pretest (p >
.1573, Hedges’ g < 0.16 for all measures),
which suggested that ROOTS and control
students were similar in the fall of
kindergarten. We found statistically significant
differences by condition in gains from fall to spring
for five dependent variables. Students in the
ROOTS condition made greater gains than
control students on the NSB (t = 3.15, df = 94,
p = .0022), ASPENS (t = 5.60, df = 118, p <
.0001), TEMA-3 standard scores (t = 3.45, df =
99, p = .0008), and RAENS (t = 9.20, df = 126,
p < .0001). Students in the ROOTS condition
also had higher covariate-adjusted SESAT
scores using ASPENS and TEMA-3 as pretest
covariates (t = 2.45, df = 136, p = .0158). We
did not detect statistically significant
differences between conditions in gains on oral
counting or differences between conditions
on covariate-adjusted SAT-10 scores with
ASPENS and TEMA-3 as pretest covariates.
The Time × Condition model estimated
differences in gains between conditions of 1.94 for
the NSB (g = 0.40), 21.78 for the ASPENS
(g = 0.64), 2.43 for the TEMA-3 standard score

Downloaded from ecx.sagepub.com at UNIV OF OREGON on October 10, 2016 









Table 3. Time × Condition Analysis With Fall-to-Spring Gains in Mathematics.




(g = 0.31), and 6.50 for the RAENS (g = 1.08).
The analysis-of-covariance model estimated
differences in covariate-adjusted SESAT scores
between conditions of 8.95 (g = 0.24).
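
The reported effect sizes can be approximately recovered from the model-estimated differences and the posttest descriptive statistics in Table 2. The sketch below assumes that each difference is standardized by the pooled posttest standard deviation and multiplied by Hedges' small-sample correction; the source does not state the exact standardizer, so this is an assumption, and the function and variable names are hypothetical.

```python
import math


def hedges_g(diff, n1, sd1, n2, sd2):
    """Hedges' g for a mean difference, using a pooled SD and small-sample correction."""
    df = n1 + n2 - 2
    pooled_sd = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df)
    correction = 1.0 - 3.0 / (4.0 * df - 1.0)
    return correction * diff / pooled_sd


# (estimated difference, ROOTS n, ROOTS SD, control n, control SD) at posttest
reported = {
    "NSB": (1.94, 208, 4.90, 86, 4.81),
    "ASPENS": (21.78, 208, 33.60, 86, 35.00),
    "TEMA-3": (2.43, 213, 7.62, 88, 8.06),
    "RAENS": (6.50, 213, 6.01, 88, 5.93),
    "SESAT": (8.95, 200, 38.51, 82, 34.24),
}

for measure, args in reported.items():
    print(measure, round(hedges_g(*args), 2))
```

Under this assumption, the computed values round to the effect sizes given in the text for all five measures.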

Discussion 

This study conducted a closely aligned
conceptual replication to further investigate the
efficacy of the ROOTS intervention program
1 year after the initial study. Participating
kindergarten classrooms in the current study
were from a different geographical region and
offered a different instructional context for the
counterfactual than the original study.
Overall, statistically significant effects on the
TEMA-3, ASPENS, and RAENS were
replicated in the Boston study. The effect sizes
obtained for these measures ranged from 0.31
to 1.08. In the Oregon study, Clarke and
colleagues (2016) failed to find statistically
significant differences on the NSB or the SESAT.
Notably, however, results from the Boston
study indicated a statistically significant
impact on these two distal outcome measures.
Effect sizes for the NSB and the SESAT,
respectively, were 0.40 and 0.24. With the exception
of the RAENS, all effect sizes fell within the
confidence intervals of the Oregon study. The
effect size for the Boston RAENS exceeded
the upper bound of the effect obtained in the
Oregon study.

These findings are noteworthy because the
replication’s control condition, relative to the
Oregon study, included programs that had a
stronger evidentiary basis. In fact, the two
primary core programs (enVisionMATH and
Scott Foresman) were reviewed by the WWC
(n.d.) and rated as effective for improving
student mathematics achievement. These
differences in instructional dimensions (i.e.,
mathematics programs) were deemed to be
“raising the bar” of the counterfactual for the
current study. The results are also important
because the Boston study included a more
ethnically diverse student sample.
Demographic data indicated a significant increase
(18%) in students whose reported ethnicity
was Hispanic. By diversifying the sample,
effects of ROOTS may be generalized to a
broader range of students.


One plausible explanation for the statistically
significant differences between the ROOTS and
control conditions in the replication study is the
timing of the intervention. Onset of ROOTS in
the Boston study occurred at a different time
point in the school year than in the Oregon study.
Whereas students in the initial study received
ROOTS starting in mid-January, the replication
began nearly 2 months earlier (mid-November).
Based on the foundational concepts and skills
addressed in ROOTS, it is plausible that the
intervention has a more beneficial effect on
mathematics outcomes when delivered earlier in
students’ kindergarten year.


In addition to our formal investigation of
treatment effects, we were also interested in
exploring whether rates of treatment
responsiveness among ROOTS students were
similar across the Oregon and Boston studies.
Although this was not formally tested, we
found that similar percentages of ROOTS
students from the Oregon and Boston studies
tested below the 10th percentile on the
TEMA-3 at both pretest (34% and 29%) and
posttest (17% and 16%). Our interest in these
percentages is based on the idea of invoking
the field to bring treatment responsiveness
into the replication conversation. To obtain a
broader perspective of replication,
researchers may need to consider whether there is a
“decline effect” (Cook, 2014) with students’
response to treatment. Rates of treatment
response equal to or better than those
obtained in earlier investigations may serve
as an additional indicator of replication. A
decline of treatment effects for at-risk
subgroups (e.g., English learners), however,
may suggest a lack of replication. Moreover,
it may indicate a shift or strengthening of the
counterfactual (Lemons et al., 2014). Our
informal explorations suggest similar
percentages of treatment responsiveness in the
two ROOTS studies.
Implications for Practice and Research

Findings from the current study align with the
growing knowledge base of effective
mathematics instruction that suggests high
concentrations of whole-number instruction delivered
via systematic and explicit mathematics
interventions are crucial for students who struggle
to develop mathematics proficiency (Bryant
et al., 2011; Gersten et al., 2009; Sood &
Jitendra, 2013). A profound understanding of
whole numbers in the early grades has strong
implications for a child’s success in later
mathematics. Therefore, results obtained by
our study and those found by previous
mathematics intervention research showcase the
need to privilege the development of number
sense among young students who face
difficulties in mathematics.


We also believe the current study has
implications for future replication research.
At a broad level, this study supports the
crucial role replication has in educational
research. Replication studies, when
conceptualized and implemented well, are valuable for
ruling out chance findings obtained in
previous research. We regard findings from the
current study as replicated evidence of the
ROOTS intervention’s impact on student
outcomes. As recommended by Coyne et al.
(2016), a logical next step for our research on
ROOTS would be to conduct a series of
increasingly distal replication studies. Such
replication studies will enable greater
generalization regarding ROOTS’ efficacy and also
allow us to systematically explore various
components of the intervention theorized to
influence student outcomes.

For example, roughly one fifth of ROOTS
students remained in what could be described
as a severe risk category (i.e., below the
10th percentile on the TEMA-3 at posttest).
Persistent findings related to low response or
nonresponse to generally effective
interventions continue to challenge the field of
academic intervention research (Fuchs, Fuchs, &
Compton, 2012). The findings of the current
study fit this pattern. Therefore, having begun to
establish overall intervention efficacy, a next
series of studies might investigate whether
complementary features to the ROOTS
intervention, such as components that target students’
cognitive processes (e.g., attention), bolster
treatment impact and improve student response.

This study also adds support for situating
replication studies within larger frameworks
of research. For example, given that the
duration of Goal 3 efficacy projects funded by the
Institute of Education Sciences (IES) is
typically 4 years, it is possible that researchers can
conduct a series of a priori replication studies
within these large-scale projects. When
conceptualizing the larger, IES-funded ROOTS
efficacy project (Clarke, Doabler, Fien, Baker,
& Smolkowski, 2012), we decided to ground
it in a systematic framework of replication
(Coyne et al., 2016). Our aim was to
accumulate converging evidence in support of the
intervention across four separate cohorts of
kindergarten students from two different
geographical regions. We acknowledge that when
planning for replication studies in large-scale
efficacy projects, factors such as statistical
power and available resources must be taken
into full consideration. However, if a series of
replications is planned well, rigorous and
trustworthy evidence in support of an
intervention can be obtained within a relatively
short time frame (Coyne et al., 2016).

Finally, the recommendations for
systematic replication frameworks proposed by Coyne
et al. (2016) yield a cogent argument that the
field of special education needs to delineate
standards for replication research. Fields such
as prevention science have endorsed
replication standards for efficacy, effectiveness, and
scale-up research (Flay et al., 2005;
Gottfredson et al., 2015). Of foremost importance,
standards in the field of special education would
establish the basis for the design and
implementation of replication research. Moreover,
such standards would increase consistency in
the way researchers conceptualize, operationalize,
and disseminate replication studies.
Currently, replication research in special
education markedly differs on how it is classified
for dissemination (Cook, Collins, Cook, &
Cook, 2016; Lemons et al., 2016).

Limitations 

Results from the current study are specific to
the targeted sample. Therefore, additional
replications should be conducted in different
geographical areas and with more diverse samples
to increase confidence in the generalizability of
findings. Additionally, the partially nested
design used for this study has weakened
internal validity compared to fully clustered or
unclustered RCTs. Although the randomly
assigned students represent potential outcomes
for each other in all background characteristics
(Rubin, 1974, 2005), the two groups may differ
in postassignment clustering effects (Bauer
et al., 2008). We have neither a theoretical
rationale nor empirical evidence that simply
clustering kindergarten students in small groups would
lead to improved math outcomes, but we cannot
rule out such an effect in the present study
because the difference in clustering across
conditions has been confounded with the
intervention delivery. On the other hand, the external
validity may be stronger. In our schools,
students at risk for math difficulties typically
receive no supplemental instruction in
kindergarten, so the ungrouped control students likely
represent the most appropriate contrast
condition. The control group experience for this study
was chosen to represent most classrooms in the
states where the project took place. That said, a
limitation in the present study is that ROOTS
students received substantially more
mathematics instruction than their control peers. This
additional time may explain some of the
improvement. Additionally, because we
simultaneously varied three important elements (i.e.,
location, timing of intervention, and
instructional context of the counterfactual), it is
difficult to determine which element may account
for the differences in findings between the
current study and the original investigation.

The possibility of unintentional bias or
potential repetition of mistakes in study logic
or procedures because of author overlap is an
additional limitation (Makel & Plucker,
2014). Author overlap occurs when the same
research team is responsible for and carries
out replications of their original research
(Coyne et al., 2016). The same research team
conducted the original and current
investigations of ROOTS. Concerns with author
overlap in the current study are largely mitigated
given that an independent evaluator conducted
the statistical analysis and an external entity
from the Boston metropolitan area collected
all student achievement and direct
observation data. It is important to note that our
research team oversaw these data collection
efforts. Additionally, both ROOTS RCTs used
school-based personnel to implement the
intervention. We believe this approach more
closely resembles the conditions studied in
large-scale evaluation projects. Efficacy trials
typically have research team members
implement the intervention in order to provide a
more controlled “ideal” setting and maximize
treatment effects. Thus, our estimates of
intervention impact may be closer in
approximation to routine classroom conditions.

An additional limitation to the findings
may be the possibility of obtaining a novelty
effect wherein the impact we observed was
due to the uniqueness or novelty of the
experiment and intervention for the research sites.
Neither Boston school district had previously
used a mathematics intervention in
kindergarten. Thus, it may be that impact resulted from
the newness and excitement related to
implementing a mathematics intervention, resulting
in changes to other practices (e.g., renewed
focus on mathematics in core instruction) that
would have translated into positive student
outcomes. The plausibility of the novelty
effect is lessened by the similar results found
across studies and could be further attenuated
by additional replications in sites that have
previous experience in implementing
mathematics interventions at the kindergarten level.

Conclusion 

Although there is consensus among the
research community on the importance of
replication, recent reviews of the special
education research literature suggest that of
the studies that demonstrate features of
replication, few use identifiers such as replicate or
replication (Cook et al., 2016; Lemons et al.,
2016). Consequently, this may represent a
missed opportunity for the field of special
education to systematically couch a
continuum of intervention research within a larger
replication framework (Coyne et al., 2016). In
this study, we sought to replicate the
beneficial treatment effects of a Tier 2 mathematics
intervention on the mathematics outcomes of
students with MD using the same
methodological and analytical procedures applied
in the initial study. Overall, we found that
intervention students outperformed their
control peers on all student outcome measures. At
a molecular level, these results suggest that
the treatment effects of ROOTS replicated in
classrooms from a different geographical area
and with a different instructional context.
More globally, we hope the integration of
systematic replication into our line of research
provides further support for the importance of
replication in advancing educational research
and practice.